Large Scale Semantic Annotation, Indexing and Search at The National Archives

Authors

  • Diana Maynard
  • Mark A. Greenwood
Abstract

This paper describes a tool developed to improve access to the enormous volume of data housed at the UK’s National Archives, both for the general public and for specialist researchers. The system we have developed, TNA-Search, enables a multi-paradigm search over the entire electronic archive (42TB of data in various formats). The search functionality allows queries that arbitrarily mix any combination of full-text, structural, linguistic and semantic queries. The archive is annotated and indexed with respect to a massive semantic knowledge base containing data from the LOD cloud, data.gov.uk, related TNA projects, and a large geographical database. The semantic annotation component achieves approximately 83% F-measure, which is very reasonable considering the wide range of entities and document types and the open domain. The technologies are being adopted by real users at The National Archives and will form the core of their suite of search tools, with additional in-house interfaces.
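For readers unfamiliar with the metric quoted above, F-measure is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard (balanced, F1) definition. The function name and the counts are hypothetical illustrations and are not taken from the paper's evaluation; the abstract does not state which F-measure variant was used.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure (F1) from annotation counts.

    All names and inputs here are illustrative assumptions,
    not the paper's evaluation code or data.
    """
    if true_pos == 0:
        # No correct annotations: precision and recall are both zero.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct annotations, 2 spurious, 2 missed.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
print(f"F-measure: {score:.2f}")
```

Because the harmonic mean penalises imbalance, a system cannot reach a high F-measure by maximising only precision or only recall.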

Similar resources

Online Multi-Label Active Learning for Large-Scale Multimedia Annotation

Existing video search engines have not taken advantage of video content analysis and semantic understanding. Video search in academia uses semantic annotation to approach content-based indexing. We argue this is a promising direction to enable real content-based video search. However, due to the complexity of both video data and semantic concepts, existing techniques on automatic video ann...

Semantics for Multimedia on the Web

The vision of the Semantic Web entails that large amounts of multimedia data should be annotated with semantic meta data. Current technology for content-based image interpretation is too limited for automated annotation of visual material. Techniques used by image search engines are also very poor and are unlikely to be improved in the near future. So, human annotations are required to make lar...

Guest Editorial: Big Media Data: Understanding, Search, and Mining (Part 2)

Big media data is a new research area that has attracted considerable research interest in both industry and academia. In the first part of this special issue, we introduced three papers on large scale similar image search, image search quality improvement, and semi-supervised multi-label image annotation. The second part of this special issue includes two examples on large scale visua...

GATECloud.net: Web-Scale Semantic Annotation and Search Made Easy

The growth of social media and other unstructured content on one hand, and Linked Open Data on the other, now pose a significant challenge for Semantic Web researchers wishing to design, implement, and evaluate semantic annotation algorithms on web scale datasets. Running experiments on standard servers is often too time consuming, whereas implementing these efficiently with MapReduce/Hadoop re...

Semantic Indexing with Typed Terms Using Rapid Annotation

Illustrated with an example of a manual semantic resource for German, the use of typed index terms for semantic indexing is proposed. Types group index terms in the same semantic category and can be used by any search or cluster mechanism. To obtain semantic resources, a method to rapidly annotate large corpora is described in detail.

Publication date: 2012